Scatteract: Automated Extraction of Data from Scatter Plots

Authors

  • Mathieu Cliche
  • David Rosenberg
  • Dhruv Madeka
  • Connie Yee
Abstract

Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical values of data points from images of scatter plots. We use deep learning techniques to identify the key components of the chart, and optical character recognition together with robust regression to map from pixels to the coordinate system of the chart. We focus on scatter plots with linear scales, which already have several interesting challenges. Previous work has done fully automatic extraction for other types of charts, but to our knowledge this is the first approach that is fully automatic for scatter plots. Our method performs well, achieving successful data extraction on 89% of the plots in our test set.
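The pixel-to-coordinate step mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which robust estimator the authors use, so the Theil-Sen estimator (median of pairwise slopes) is swapped in here purely as one example of robust regression. The function names `fit_axis_mapping` and `pixel_to_value` and the tick data are hypothetical; the tick values stand in for OCR output, with one deliberately misread label.

```python
from itertools import combinations
from statistics import median


def fit_axis_mapping(pixels, values):
    """Fit value = slope * pixel + intercept with the Theil-Sen estimator.

    Assumption: the axis is linear, and most (not all) OCR tick readings
    are correct. This is an illustrative stand-in, not the paper's method.
    """
    points = list(zip(pixels, values))
    # Slope of the line through every pair of ticks at distinct positions.
    slopes = [
        (v2 - v1) / (p2 - p1)
        for (p1, v1), (p2, v2) in combinations(points, 2)
        if p2 != p1
    ]
    if not slopes:
        raise ValueError("need at least two ticks at distinct pixel positions")
    # Medians make the fit insensitive to a minority of bad readings.
    slope = median(slopes)
    intercept = median(v - slope * p for p, v in points)
    return slope, intercept


def pixel_to_value(pixel, slope, intercept):
    """Map a pixel position to the chart's coordinate system."""
    return slope * pixel + intercept


# Hypothetical x-axis ticks: pixel positions and the values read by OCR.
# The fourth label was "30" on the chart but is misread here as 300.
tick_pixels = [100, 200, 300, 400, 500]
tick_values = [0.0, 10.0, 20.0, 300.0, 40.0]

slope, intercept = fit_axis_mapping(tick_pixels, tick_values)
point_x = pixel_to_value(250, slope, intercept)  # a detected point at pixel 250
print(slope, intercept, point_x)
```

With an ordinary least-squares fit, the single misread label would tilt the whole axis mapping; with a median-based fit, the four consistent ticks determine the line. The same fit would be run separately for the y-axis.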

Similar resources

Facilitating systematic reviews, data extraction and meta-analysis with the metagear package for R

1. The R package ecosystem is rich in tools for the statistics of meta-analysis. However, there are few resources available to facilitate research synthesis as a whole. 2. Here, I present the METAGEAR package for R. It is a comprehensive, multifunctional toolbox with capabilities aimed to cover much of the research synthesis taxonomy: from applying a systematic review approach to objectively as...

Improved Nonparametric Discriminant Analysis for Hyperspectral Image Classification with Limited Training Samples

Feature extraction plays an important role in improving hyperspectral image classification. Compared with parametric methods, nonparametric feature extraction methods perform better when classes are not normally distributed. In addition, these methods can extract more features than parametric feature extraction methods can. Nonparametric feature extraction methods use nonparametric s...

Visualizing Multi-Dimensional Data

High-dimensional data visualization is very important in data analysis since it gives a direct and natural view of data. In this paper, we propose a method to visualize large amounts of high-dimensional data in a 3-D space. In our method, we first divide the high-dimensional data into several groups of lower-dimensional data. Then, we use different icons to represent different groups. Initial expe...

Generalized scatter plots

Scatter plots are one of the most powerful and most widely used techniques for visual data exploration. A well-known problem is that scatter plots often have a high degree of overlap, which may occlude a significant portion of the data values shown. In this paper, we propose the generalized scatter plot technique, which allows an overlap-free representation of ...

Validity of Selected WBC Differentiation Flags in Sysmex XT-1800i

Background: Automatic cell counter devices make the CBC differential very easy and deliver results in a few seconds. However, a problem with these devices is that a flag requires a time-consuming microscopic review of the specimen, which causes unacceptable wait times for patients as well as costs for laboratories. In this study, we calculated the validity of WBC d...

Publication date: 2017